Results 1 - 16 of 16
1.
Article in English | MEDLINE | ID: mdl-38607711

ABSTRACT

3D dense captioning requires a model to translate its understanding of an input 3D scene into several captions associated with different object regions. Existing methods adopt a sophisticated "detect-then-describe" pipeline, which builds explicit relation modules upon a 3D detector with numerous hand-crafted components. While these methods have achieved initial success, the cascade pipeline tends to accumulate errors because of duplicated and inaccurate box estimations and cluttered 3D scenes. In this paper, we first propose Vote2Cap-DETR, a simple yet effective transformer framework that decouples caption generation from object localization through parallel decoding. Moreover, we argue that object localization and description generation require different levels of scene understanding, which can be challenging for a shared set of queries to capture. To this end, we propose an advanced version, Vote2Cap-DETR++, which decouples the queries into localization and caption queries to capture task-specific features. Additionally, we introduce an iterative spatial refinement strategy for vote queries to achieve faster convergence and better localization performance, and we inject additional spatial information into the caption head for more accurate descriptions. Without bells and whistles, extensive experiments on two commonly used datasets, ScanRefer and Nr3D, demonstrate that Vote2Cap-DETR and Vote2Cap-DETR++ surpass conventional "detect-then-describe" methods by a large margin. The code is available at https://github.com/ch3cook-fdu/Vote2Cap-DETR.
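The parallel-decoding idea — two task-specific query sets attending to the same encoded scene, each feeding its own head — can be sketched in a few lines. Everything below (single-head attention, the box/vocabulary head sizes, the query counts) is illustrative and not the paper's actual architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attend(queries, scene_feats):
    # Single-head cross-attention: each query gathers scene context.
    attn = softmax(queries @ scene_feats.T / np.sqrt(queries.shape[1]))
    return attn @ scene_feats

rng = np.random.default_rng(0)
scene = rng.normal(size=(1024, 32))      # encoded 3D scene tokens
loc_queries = rng.normal(size=(16, 32))  # localization queries
cap_queries = rng.normal(size=(16, 32))  # caption queries

# Parallel decoding: both query sets attend to the same scene features,
# then feed task-specific heads (here, plain linear projections).
W_box = rng.normal(size=(32, 6))         # box head -> (center, size)
W_word = rng.normal(size=(32, 100))      # caption head -> toy vocab logits

boxes = cross_attend(loc_queries, scene) @ W_box
word_logits = cross_attend(cap_queries, scene) @ W_word
print(boxes.shape, word_logits.shape)    # (16, 6) (16, 100)
```

Because the two query sets are decoded in parallel rather than cascaded, caption errors cannot feed back into localization, which is the failure mode the abstract attributes to "detect-then-describe" pipelines.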

2.
Brain Sci ; 13(10)2023 Oct 19.
Article in English | MEDLINE | ID: mdl-37891850

ABSTRACT

BACKGROUND: The prognosis of diffuse midline glioma (DMG) patients with H3K27M alterations (H3K27M-DMG) is poor; however, a model that enables accurate, individualized prognosis prediction for such lesions remains elusive. We aimed to construct an H3K27M-DMG survival model based on DeepSurv to predict patient prognosis. METHODS: Patients recruited from a single center were used for model training, and patients recruited from another center were used for external validation. Univariate and multivariate Cox regression analyses were used to select features. Four machine learning models were constructed, and the concordance index (C-index) and integrated Brier score (IBS) were calculated. We used the receiver operating characteristic (ROC) curve and the area under the curve (AUC) to assess the accuracy of predicting 6-, 12-, 18- and 24-month survival rates. A heatmap of feature importance was used to explain the results of the four models. RESULTS: We recruited 113 patients for the training set and 23 patients for the test set. We included tumor size, tumor location, Karnofsky Performance Scale (KPS) score, enhancement, radiotherapy, and chemotherapy for model training. DeepSurv achieved the highest prediction accuracy among the four models, with C-indexes of 0.862 and 0.811 in the training and external test sets, respectively. The DeepSurv model also had the highest AUC values at 6, 12, 18 and 24 months: 0.970 (0.919-1), 0.950 (0.877-1), 0.939 (0.845-1), and 0.875 (0.690-1), respectively. We designed an interactive interface to display the survival probability predictions of the DeepSurv model more intuitively. CONCLUSION: The DeepSurv model outperforms traditional machine learning models in prediction accuracy and robustness, and it can also provide personalized treatment recommendations. It may offer decision-making assistance for patients formulating treatment plans in the future.
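For reference, the C-index reported above measures how often a model ranks patient risks in the order their survival times actually fall. A minimal implementation of Harrell's C-index (toy data, not the study's):

```python
import numpy as np

def concordance_index(times, events, risk_scores):
    # Harrell's C-index: fraction of comparable patient pairs whose
    # predicted risk ordering agrees with the observed survival ordering.
    # A pair (i, j) is comparable when the patient with the shorter time
    # had an observed event (not censoring). Ties in risk count as 0.5.
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if times[i] < times[j] and events[i]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / comparable

times = np.array([5.0, 10.0, 12.0, 3.0])
events = np.array([1, 1, 0, 1])          # 1 = event observed, 0 = censored
risks = np.array([0.9, 0.4, 0.2, 1.5])   # higher = worse predicted prognosis
print(concordance_index(times, events, risks))  # perfectly concordant -> 1.0
```

A C-index of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is why the reported 0.862/0.811 indicate strong discrimination.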

3.
IEEE Trans Image Process ; 32: 5423-5437, 2023.
Article in English | MEDLINE | ID: mdl-37773910

ABSTRACT

We propose a weakly supervised approach for salient object detection from multi-modal RGB-D data. Our approach relies only on scribble labels, which are much easier to annotate than the dense labels used in the conventional fully supervised setting. In contrast to existing methods that apply supervision signals in the output space, our design regularizes the intermediate latent space to enhance discrimination between salient and non-salient objects. We further introduce a contour detection branch to implicitly constrain semantic boundaries and achieve precise edges for detected salient objects. To enhance long-range dependencies among local features, we introduce a Cross-Padding Attention Block (CPAB). Extensive experiments on seven benchmark datasets demonstrate that our method not only outperforms existing weakly supervised methods but is also on par with several fully supervised state-of-the-art models. Code is available at https://github.com/leolyj/DHFR-SOD.

4.
Article in English | MEDLINE | ID: mdl-37506014

ABSTRACT

Learning from a sequence of tasks over a lifetime is essential for an agent on the path toward artificial general intelligence. Despite the explosion of this research field in recent years, most work focuses on the well-known catastrophic forgetting issue. In contrast, this work aims to explore knowledge-transferable lifelong learning without storing historical data or incurring significant additional computational overhead. We demonstrate that existing data-free frameworks, including regularization-based single-network and structure-based multi-network frameworks, face a fundamental issue of lifelong learning, named anterograde forgetting: preserving and transferring memory may inhibit the learning of new knowledge. We attribute this to two causes: the learning capacity of the network shrinks as it memorizes historical knowledge, and conceptual confusion arises between irrelevant old knowledge and the current task. Inspired by complementary learning theory in neuroscience, we endow artificial neural networks with the ability to learn continuously without forgetting while recalling historical knowledge to facilitate the learning of new knowledge. Specifically, this work proposes a general framework named cycle memory networks (CMNs). A CMN consists of two individual memory networks that store short- and long-term memories separately to avoid capacity shrinkage, plus a transfer cell between them. The transfer cell enables knowledge transfer from the long-term to the short-term memory network to mitigate conceptual confusion, and a memory consolidation mechanism integrates short-term knowledge into the long-term memory network for knowledge accumulation. We demonstrate that the CMN effectively addresses anterograde forgetting on several task-related, task-conflict, class-incremental, and cross-domain benchmarks. Furthermore, we provide extensive ablation studies to verify each framework component. The source code is available at: https://github.com/GeoX-Lab/CMN.

5.
Cancer Med ; 12(16): 17139-17148, 2023 08.
Article in English | MEDLINE | ID: mdl-37461358

ABSTRACT

BACKGROUND: H3K27M mutation status significantly affects the prognosis of patients with diffuse midline gliomas (DMGs), but acquiring pathological tissue from these tumors carries a high risk. We aimed to construct a fully automated model for predicting the H3K27M alteration status of DMGs based on deep learning using whole-brain MRI. METHODS: DMG patients from West China Hospital of Sichuan University (WCHSU; n = 200) and Chengdu Shangjin Nanfu Hospital (CSNH; n = 35) who met the inclusion and exclusion criteria from February 2016 to April 2022 were enrolled as the training and external test sets, respectively. To adapt the model to head MRI, we pretrained it on normal human head MR images. Because the classification and tumor segmentation tasks are naturally related, we cotrained the two tasks to enable information interaction between them and improve the accuracy of the classification task. RESULTS: The average classification accuracies of our model on the training and external test sets were 90.5% and 85.1%, respectively. Ablation experiments showed that pretraining and cotraining improved the prediction accuracy and generalization performance of the model. In the training and external test sets, the average areas under the receiver operating characteristic curve (AUROCs) were 94.18% and 87.64%, and the average areas under the precision-recall curve (AUPRCs) were 93.26% and 85.4%. CONCLUSIONS: The developed model achieved excellent performance in predicting the H3K27M alteration status of DMGs, and its reproducibility and generalization were verified on the external dataset.
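The cotraining idea — one shared encoder optimized jointly for classification and segmentation — is typically expressed as a weighted sum of the two task losses. The loss forms (binary cross-entropy plus Dice) and the weight below are assumptions for illustration, not the paper's exact objective.

```python
import numpy as np

def bce(p, y, eps=1e-9):
    # Binary cross-entropy for the classification head.
    return -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

def dice_loss(p, y, eps=1e-9):
    # Soft Dice loss for the segmentation head (overlap-based).
    inter = (p * y).sum()
    return 1.0 - (2 * inter + eps) / (p.sum() + y.sum() + eps)

def cotraining_loss(cls_prob, cls_label, seg_prob, seg_mask, w_seg=0.5):
    # Joint objective: the segmentation term forces the shared encoder
    # to localize the tumor, supplying spatial cues that the
    # classification head can exploit.
    return (bce(np.array([cls_prob]), np.array([cls_label]))
            + w_seg * dice_loss(seg_prob, seg_mask))

seg_pred = np.full((4, 4), 0.9)  # toy predicted tumor mask
seg_true = np.ones((4, 4))       # toy ground-truth mask
loss = cotraining_loss(0.8, 1, seg_pred, seg_true)
print(round(loss, 4))
```

Gradients from both terms flow into the same encoder, which is the "information interaction" the abstract credits for the accuracy gain.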


Subject(s)
Deep Learning; Glioma; Humans; Reproducibility of Results; Magnetic Resonance Imaging; Brain; Glioma/diagnostic imaging; Glioma/genetics
6.
Sci Rep ; 13(1): 9970, 2023 06 20.
Article in English | MEDLINE | ID: mdl-37340065

ABSTRACT

H3 K27M-mutant diffuse midline glioma (H3 K27M-mt DMG) is a rare, highly invasive tumor with a poor prognosis. The prognostic factors of H3 K27M-mt DMG have not been fully identified, and no clinical prediction model exists for it. This study aimed to develop and validate a prognostic model for predicting the probability of survival in patients with H3 K27M-mt DMG. Patients diagnosed with H3 K27M-mt DMG at West China Hospital from January 2016 to August 2021 were included. Cox proportional hazards regression was used for survival assessment, with adjustment for known prognostic factors. The final model was established using patient data from our center as the training cohort and data from other centers for external independent verification. One hundred five patients were ultimately included in the training cohort, and 43 cases from another institution served as the validation cohort. The factors influencing survival probability in the prediction model were age, preoperative KPS score, radiotherapy, and Ki-67 expression level. The adjusted concordance indices of the Cox regression model in internal bootstrap validation at 6, 12, and 18 months were 0.776, 0.766, and 0.764, respectively. The calibration chart showed high consistency between predicted and observed results. Discrimination in external verification was 0.785, and the calibration curve showed good calibration. We identified the risk factors that affect the prognosis of H3 K27M-mt DMG patients and then established and validated a prognostic model for predicting their survival probability.
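As background on how a Cox-based model like this turns covariates into a survival probability: S(t | x) = S0(t)^exp(β·x), where β·x is the linear predictor. The coefficients and baseline survival values below are invented for illustration and are not taken from the study.

```python
import math

# Hypothetical Cox coefficients for the four predictors named above
# (age, preoperative KPS, radiotherapy, Ki-67) and made-up baseline
# survival S0(t) at 6, 12, and 18 months.
beta = {"age": 0.02, "kps": -0.03, "radiotherapy": -0.6, "ki67": 0.015}
baseline_survival = {6: 0.85, 12: 0.60, 18: 0.40}

def survival_prob(patient, t):
    # Cox model: S(t | x) = S0(t) ** exp(linear predictor).
    lp = sum(beta[k] * patient[k] for k in beta)
    return baseline_survival[t] ** math.exp(lp)

patient = {"age": 30, "kps": 80, "radiotherapy": 1, "ki67": 20}
for t in (6, 12, 18):
    print(t, round(survival_prob(patient, t), 3))
```

Because exp(β·x) only rescales the baseline hazard, predicted survival always decreases with t for a fixed patient, which is what the nomogram's calibration at 6, 12, and 18 months checks against observed outcomes.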


Subject(s)
Astrocytoma; Brain Neoplasms; Glioma; Humans; Prognosis; Brain Neoplasms/pathology; Glioma/diagnosis; Glioma/genetics; Glioma/pathology; Histones/genetics; Nomograms; Mutation
7.
IEEE Trans Neural Netw Learn Syst ; 33(9): 4243-4256, 2022 09.
Article in English | MEDLINE | ID: mdl-33577459

ABSTRACT

Enabling a neural network to sequentially learn multiple tasks is of great significance for expanding the applicability of neural networks in real-world applications. However, artificial neural networks face the well-known problem of catastrophic forgetting. Worse, the degradation of previously learned skills becomes more severe as the task sequence grows, a phenomenon known as long-term catastrophic forgetting. This is due to two facts: first, as the model learns more tasks, the intersection of the low-error parameter subspaces satisfying these tasks becomes smaller or may not even exist; second, when the model learns a new task, the cumulative error keeps increasing as the model tries to protect the parameter configurations of previous tasks from interference. Inspired by the memory consolidation mechanism of synaptic plasticity in mammalian brains, we propose a confrontation mechanism, Adversarial Neural Pruning and synaptic Consolidation (ANPyC), to overcome long-term catastrophic forgetting. Neural pruning acts as long-term depression, removing task-irrelevant parameters, while a novel synaptic consolidation acts as long-term potentiation, strengthening task-relevant parameters. During training, this confrontation reaches a balance in which only crucial parameters remain and non-significant parameters are freed to learn subsequent tasks. ANPyC avoids forgetting important information and makes the model efficient at learning a large number of tasks. Specifically, neural pruning iteratively relaxes the current task's parameter conditions to expand the common parameter subspace across tasks, and the synaptic consolidation strategy, which consists of a structure-aware parameter-importance measurement and an element-wise parameter-updating strategy, decreases the cumulative error when learning new tasks. Our approach encourages synapses to be sparse and polarized, which enables long-term learning and memory. ANPyC exhibits effectiveness and generalization on both image classification and generation tasks with multilayer perceptrons, convolutional neural networks, generative adversarial networks, and variational autoencoders. The full source code is available at https://github.com/GeoX-Lab/ANPyC.
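The prune/consolidate confrontation can be caricatured in a few lines of numpy: simple magnitude pruning stands in for the paper's adversarial neural pruning, and an EWC-style quadratic penalty stands in for its structure-aware synaptic consolidation. Both substitutions are simplifications, not the paper's method.

```python
import numpy as np

def prune(weights, keep_ratio=0.5):
    # Long-term-depression analogue: zero out the smallest-magnitude
    # (task-irrelevant) parameters, freeing capacity for future tasks.
    k = int(len(weights) * keep_ratio)
    thresh = np.sort(np.abs(weights))[-k]
    mask = np.abs(weights) >= thresh
    return weights * mask, mask

def consolidation_penalty(weights, old_weights, importance):
    # Long-term-potentiation analogue: an importance-weighted quadratic
    # penalty that anchors task-relevant parameters near their old values.
    return float(np.sum(importance * (weights - old_weights) ** 2))

rng = np.random.default_rng(1)
w_old = rng.normal(size=8)
w_pruned, mask = prune(w_old, keep_ratio=0.5)
importance = mask.astype(float)   # surviving (crucial) weights matter
w_new = w_pruned + 0.1            # drift while learning a new task
print(int(mask.sum()), round(consolidation_penalty(w_new, w_pruned, importance), 4))
```

Pruned positions carry zero importance, so the new task can repurpose them freely, while drift in the surviving weights is penalized: the "balance" the abstract describes.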


Subject(s)
Learning; Neural Networks, Computer; Animals; Brain; Machine Learning; Mammals; Neuronal Plasticity
8.
IEEE Trans Image Process ; 30: 6594-6608, 2021.
Article in English | MEDLINE | ID: mdl-34270425

ABSTRACT

Semantic segmentation is a crucial image understanding task in which each pixel of an image is assigned a corresponding label. Since pixel-wise ground-truth labeling is tedious and labor-intensive, many practical works exploit synthetic images to train models for real-world image semantic segmentation, i.e., Synthetic-to-Real Semantic Segmentation (SRSS). However, deep convolutional neural networks (CNNs) trained on source synthetic data may not generalize well to target real-world data. To address this problem, there has been rapidly growing interest in Domain Adaptation techniques that mitigate the domain mismatch between synthetic and real-world images. Domain Generalization is another solution for SRSS: in contrast to Domain Adaptation, it seeks to address SRSS without accessing any data from the target domain during training. In this work, we propose two simple yet effective texture randomization mechanisms for Domain Generalization-based SRSS: Global Texture Randomization (GTR) and Local Texture Randomization (LTR). GTR randomizes the texture of source images into diverse unreal texture styles, alleviating the network's reliance on texture while promoting the learning of domain-invariant cues. In addition, we observe that texture differences do not always span the entire image and may appear only in some local areas; we therefore propose LTR to generate diverse local regions for partially stylizing source images. Finally, we implement a Consistency regularization between GTR and LTR (CGL) to harmonize the two mechanisms during training. Extensive experiments on five publicly available datasets (GTA5, SYNTHIA, Cityscapes, BDDS and Mapillary) with various SRSS settings (GTA5/SYNTHIA to Cityscapes/BDDS/Mapillary) demonstrate that the proposed method is superior to state-of-the-art methods for Domain Generalization-based SRSS.

9.
Article in English | MEDLINE | ID: mdl-34280090

ABSTRACT

Image smoothing is a fundamental procedure in both computer vision and graphics applications. The required smoothing properties can differ or even be contradictory across tasks, yet the inherent smoothing nature of a given operator is usually fixed and thus cannot meet the varied requirements of different applications. In this paper, we first introduce the truncated Huber penalty function, which shows strong flexibility under different parameter settings. We then propose a generalized framework built on this penalty function. Combined with its strong flexibility, our framework can achieve diverse smoothing natures, including contradictory smoothing behaviors, and can yield smoothing behavior that previous methods seldom achieve, giving superior performance in challenging cases. Together, these properties make our framework capable of a range of applications and able to outperform state-of-the-art approaches in several tasks. In addition, an efficient numerical solution is provided, and its convergence is theoretically guaranteed even though the optimization framework is non-convex and non-smooth. A simple yet effective approach is further proposed to reduce the computational cost of our method while maintaining its performance. The effectiveness and superior performance of our approach are validated through comprehensive experiments across a range of applications.
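One plausible form of a truncated Huber penalty (the paper's exact parameterization may differ): quadratic near zero, linear up to a second threshold, then constant, so small residuals are smoothed while large discontinuities (edges) stop accruing penalty. This flat tail is what makes the overall objective non-convex.

```python
def truncated_huber(x, delta=1.0, tau=3.0):
    # Huber penalty (quadratic for |x| <= delta, linear beyond),
    # truncated to a constant for |x| >= tau.
    ax = abs(x)
    if ax <= delta:
        h = 0.5 * ax * ax
    else:
        h = delta * (ax - 0.5 * delta)
    cap = delta * (tau - 0.5 * delta)  # penalty value at |x| = tau
    return min(h, cap)

for x in (0.5, 2.0, 10.0):
    print(x, truncated_huber(x))  # 0.125, 1.5, 2.5 (capped)
```

Varying delta and tau slides the function between L2-like, L1-like, and truncated (edge-ignoring) regimes, which is the flexibility the abstract attributes to the penalty.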

10.
IEEE Trans Image Process ; 30: 3204-3216, 2021.
Article in English | MEDLINE | ID: mdl-33621174

ABSTRACT

In recent years, Salient Object Detection (SOD) has shown great success thanks to large-scale benchmarks and deep learning techniques. However, existing SOD methods mainly focus on natural images at low resolutions, e.g., 400×400 or less. This drawback hinders their use in advanced practical applications, which require high-resolution, detail-aware results. The lack of boundary detail and semantic context for salient objects is also a key concern for accurate SOD. To address these issues, in this work we focus on the High-Resolution Salient Object Detection (HRSOD) task. Technically, we propose the first end-to-end learnable framework, named the Dual ReFinement Network (DRFNet), for fully automatic HRSOD. The proposed DRFNet consists of a shared feature extractor and two effective refinement heads. By decoupling detail and context information, one refinement head adopts a global-aware feature pyramid that boosts spatial detail information without much additional computational burden, narrowing the gap between high-level semantics and low-level details. In parallel, the other refinement head adopts hybrid dilated convolutional blocks and group-wise upsampling, which are very efficient at extracting contextual information. Based on these dual refinements, our approach enlarges receptive fields and obtains more discriminative features from high-resolution images. Experimental results on high-resolution benchmarks (the public DUT-HRSOD and the proposed DAVIS-SOD) demonstrate that our method is not only efficient but also more accurate than other state-of-the-art methods. Moreover, our method generalizes well to typical low-resolution benchmarks.

11.
IEEE Trans Image Process ; 30: 55-67, 2021.
Article in English | MEDLINE | ID: mdl-33125327

ABSTRACT

Street Scene Change Detection (SSCD) aims to locate the changed regions between a given pair of street-view images captured at different times, an important yet challenging task in the computer vision community. The intuitive way to solve the SSCD task is to fuse the extracted image feature pairs and then directly measure the dissimilar parts to produce a change map. The key to the SSCD task is therefore to design an effective feature fusion method that can improve the accuracy of the resulting change maps. To this end, we present a novel Hierarchical Paired Channel Fusion Network (HPCFNet), which utilizes adaptive fusion of paired feature channels. Specifically, the features of a given image pair are jointly extracted by a Siamese Convolutional Neural Network (SCNN) and hierarchically combined by exploring the fusion of channel pairs at multiple feature levels. In addition, based on the observation that the distribution of scene changes is diverse, we further propose a Multi-Part Feature Learning (MPFL) strategy to detect diverse changes. With the MPFL strategy, our framework adapts to the scale and location diversity of scene change regions. Extensive experiments on three public datasets (PCD, VL-CMU-CD and CDnet2014) demonstrate that the proposed framework achieves superior performance, outperforming other state-of-the-art methods by a considerable margin.


Subject(s)
Deep Learning; Image Processing, Computer-Assisted/methods; Databases, Factual; Neural Networks, Computer; Video Recording
12.
Article in English | MEDLINE | ID: mdl-32167895

ABSTRACT

Street Scene Parsing (SSP) is a fundamental and important step for autonomous driving and traffic scene understanding. Recently, Fully Convolutional Network (FCN)-based methods have delivered impressive performance with the help of large-scale dense-labeling datasets. However, in urban traffic environments, not all labels contribute equally to the control decision. Certain labels, such as pedestrian, car, bicyclist, road lane, or sidewalk, are more important than labels for vegetation, sky, or building. Based on this fact, in this paper we propose a novel deep learning framework, named the Residual Atrous Pyramid Network (RAPNet), for importance-aware SSP. More specifically, to incorporate the importance of the various object classes, we propose an Importance-Aware Feature Selection (IAFS) mechanism that automatically selects the important features for label prediction. The IAFS can operate in each convolutional block; semantic features with different importance are captured in different channels and automatically assigned corresponding weights. To enhance labeling coherence, we also propose a Residual Atrous Spatial Pyramid (RASP) module to sequentially aggregate global-to-local context information in a residual refinement manner. Extensive experiments on two public benchmarks show that our approach achieves new state-of-the-art performance and consistently obtains more accurate results on the semantic classes with high importance levels.
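Channel-wise feature selection of this kind is commonly realized as learned channel gating. As a rough analogue of the IAFS idea (not the paper's actual mechanism), here is a squeeze-and-excitation-style sketch in numpy, with random matrices standing in for learned weights:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_gate(feats, W1, W2):
    # Squeeze: global-average-pool each channel to one scalar.
    pooled = feats.mean(axis=(1, 2))              # shape (C,)
    # Excite: a tiny bottleneck MLP yields per-channel importance in (0, 1).
    scores = sigmoid(W2 @ np.tanh(W1 @ pooled))   # shape (C,)
    # Reweight: important channels pass, unimportant ones are suppressed.
    return feats * scores[:, None, None], scores

rng = np.random.default_rng(0)
feats = rng.normal(size=(8, 16, 16))              # C x H x W feature map
W1 = rng.normal(size=(4, 8))                      # bottleneck down-projection
W2 = rng.normal(size=(8, 4))                      # up-projection back to C
gated, scores = channel_gate(feats, W1, W2)
print(gated.shape, scores.round(3))
```

In an importance-aware setting, the gate weights would be trained so that channels encoding high-priority classes (pedestrian, car) receive larger scores than those encoding background classes.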

13.
Article in English | MEDLINE | ID: mdl-32086208

ABSTRACT

Recently, the Fully Convolutional Network (FCN) has become the go-to architecture for image segmentation, including semantic scene parsing. However, it is difficult for a generic FCN to predict semantic labels around object boundaries, so FCN-based methods usually produce parsing results with inaccurate boundaries. Meanwhile, many works have demonstrated that level-set-based active contours are superior for boundary estimation with sub-pixel accuracy, but they are quite sensitive to initial settings. To address these limitations, in this paper we propose a novel Deep Multiphase Level Set (DMLS) method for semantic scene parsing, which efficiently incorporates multiphase level sets into deep neural networks. The proposed method consists of three modules: recurrent FCNs, an adaptive multiphase level set, and deeply supervised learning. More specifically, the recurrent FCNs learn multi-level representations of input images with different contexts. The adaptive multiphase level set drives a discriminative contour for each semantic class, exploiting the advantages of both global and local information. In each time step of the recurrent FCNs, deeply supervised learning is incorporated for model training. Extensive experiments on three public benchmarks show that our proposed method achieves new state-of-the-art performance. The source code will be released at https://github.com/Pchank/DMLS-for-SSP.

14.
IEEE Trans Image Process ; 24(11): 4502-11, 2015 Nov.
Article in English | MEDLINE | ID: mdl-26208344

ABSTRACT

A random submatrix method (RSM) is proposed to calculate the low-rank decomposition U_{m×r} V_{n×r}^T (r < m, n) of a matrix Y ∈ R^{m×n} (assuming m > n generally) with known entry percentage 0 < ρ ≤ 1. RSM is very fast, as only O(m r^2 ρ^r) or O(n^3 ρ^{3r}) floating-point operations (flops) are required, comparing favorably with the O(mnr + r^2(m+n)) flops required by state-of-the-art algorithms. Meanwhile, RSM has a small memory requirement, as only max(n^2, mr + nr) real values need to be stored. Under the assumption that known entries are uniformly distributed in Y, submatrices formed by known entries are randomly selected from Y with statistical size k × nρ^k or mρ^l × l, where k or l is usually taken as r + 1. We propose and prove a theorem: under random noise, the probability that the subspace associated with a smaller singular value turns into the space associated with any of the r largest singular values is smaller. Based on the theorem, the nρ^k − k null vectors or the l − r right singular vectors associated with the minor singular values are calculated for each submatrix. These vectors ought to be the null vectors of the submatrix formed by the chosen nρ^k or l columns of the ground truth of V^T. If enough submatrices are randomly chosen, V and U can be estimated accordingly. Experimental results on random synthetic matrices with sizes such as 131072 × 1024 and on real data sets such as dinosaur indicate that RSM is 4.30∼197.95 times faster than state-of-the-art algorithms while achieving or approaching the best precision.
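The core observation — a right null vector of a small rank-r submatrix is also a null vector of the corresponding columns of V^T — can be checked numerically. The sketch below uses a fully observed, noiseless rank-r matrix for clarity; the paper's method additionally handles missing entries and noise.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 60, 20, 2
U = rng.normal(size=(m, r))
V = rng.normal(size=(n, r))
Y = U @ V.T                            # exactly rank r, fully observed here

# Pick r+1 columns and r+1 rows: the submatrix also has rank at most r,
# so it has a right null vector, which must simultaneously annihilate
# the corresponding r+1 columns of V^T (since U[rows] has full column rank).
cols = rng.choice(n, size=r + 1, replace=False)
rows = rng.choice(m, size=r + 1, replace=False)
sub = Y[np.ix_(rows, cols)]
null_vec = np.linalg.svd(sub)[2][-1]   # right singular vector of smallest sv

residual = np.linalg.norm(V.T[:, cols] @ null_vec)
print(residual < 1e-6)                 # null vector transfers to V^T
```

Collecting such null vectors from enough random submatrices pins down the row space of V^T, after which U follows by least squares, which is the estimation route the abstract describes.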

15.
Sensors (Basel) ; 14(12): 24156-73, 2014 Dec 15.
Article in English | MEDLINE | ID: mdl-25517694

ABSTRACT

Recognizing 3D objects from point clouds in the presence of significant clutter and occlusion is a highly challenging task. In this paper, we present a coarse-to-fine 3D object recognition algorithm. During the phase of offline training, each model is represented with a set of multi-scale local surface features. During the phase of online recognition, a set of keypoints are first detected from each scene. The local surfaces around these keypoints are further encoded with multi-scale feature descriptors. These scene features are then matched against all model features to generate recognition hypotheses, which include model hypotheses and pose hypotheses. Finally, these hypotheses are verified to produce recognition results. The proposed algorithm was tested on two standard datasets, with rigorous comparisons to the state-of-the-art algorithms. Experimental results show that our algorithm was fully automatic and highly effective. It was also very robust to occlusion and clutter. It achieved the best recognition performance on all of these datasets, showing its superiority compared to existing algorithms.

16.
J Opt Soc Am A Opt Image Sci Vis ; 31(5): 981-95, 2014 May 01.
Article in English | MEDLINE | ID: mdl-24979630

ABSTRACT

This work presents a novel computed tomography (CT) reconstruction method for the few-view problem based on fractional calculus. To overcome the disadvantages of the total variation minimization method, we propose a fractional-order total variation-based image reconstruction method in this paper. The presented model adopts fractional-order total variation instead of traditional total variation. Different from traditional total variation, fractional-order total variation is derived by considering more neighboring image voxels such that the corresponding weights can be adaptively determined by the model, thus suppressing the over-smoothing effect. The discretization scheme of the fractional-order model is also given. Numerical and clinical experiments demonstrate that our method achieves better performance than existing reconstruction methods, including filtered back projection (FBP), the total variation-based projections onto convex sets method (TV-POCS), and soft-threshold filtering (STH).
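A fractional-order difference is commonly discretized with Grünwald-Letnikov coefficients w_k = (-1)^k C(α, k), which assign adaptively decaying weights to progressively farther neighbors; this is one standard way to realize the "more neighboring voxels" property described above, though the paper's exact discretization scheme may differ.

```python
def gl_weights(alpha, n):
    # Grünwald-Letnikov coefficients w_k = (-1)^k * C(alpha, k) for a
    # discrete fractional-order difference D^alpha f(i) ~ sum_k w_k f(i-k),
    # generated by the recurrence w_k = w_{k-1} * (1 - (alpha + 1) / k).
    # For alpha = 1 this reduces to the ordinary difference [1, -1, 0, ...].
    w = [1.0]
    for k in range(1, n):
        w.append(w[-1] * (1 - (alpha + 1) / k))
    return w

print(gl_weights(1.0, 4))   # integer order: only two nonzero weights
print(gl_weights(1.5, 4))   # fractional order: decaying tail of weights
```

The decaying tail for non-integer α is what lets fractional-order total variation weight a neighborhood of voxels instead of a single adjacent pair, suppressing the staircase/over-smoothing effect of ordinary TV.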


Subject(s)
Algorithms; Radiographic Image Enhancement/methods; Radiographic Image Interpretation, Computer-Assisted/methods; Signal Processing, Computer-Assisted; Tomography, X-Ray Computed/methods